Abstract

Define the non-overlapping return time of a random process to be the 
number of blocks that we wait before a particular block reappears. 
We prove a Central Limit Theorem based on these return times. This 
result has applications to entropy estimation, and to the problem of 
determining if digits have come from an independent equidistributed 
sequence. In the case of an equidistributed sequence, we use an argument
based on negative association to prove convergence under weaker
conditions.
Keywords: … negative association, return time

Mathematics Subject Classification: Primary 94A17; Secondary 60F05; 62H20. 

1 Introduction and main theorem 



1.1 Statement of Problem 

Given a sample $(Z_1, \ldots, Z_n)$ from a random process taking values in an
alphabet $A$, we would like to estimate the entropy of the process. In general,
this is a hard problem, though if the process is assumed to be independent
or stationary, some progress can be made.

In particular, given a sequence of binary bits, determining whether the bits
were generated by an independent equidistributed process has applications
to problems in cryptography and number theory, as described in Section 1.3.

Our approach is as follows: we first partition the sample $(Z_i)$ into blocks
of size $\ell$. That is, writing $Z_a^b$ for $(Z_a, Z_{a+1}, \ldots, Z_b)$, we define block
random variables $X_j = Z_{(j-1)\ell+1}^{j\ell}$, so each $X_j \in A^\ell$. Then, given the first $k$
blocks $X_1, \ldots, X_k$, we count how long it takes for these blocks to reappear.

Definition 1.1 (Non-overlapping return time) For a given $k$, define random
variables

$$S_j = \min\{t \geq 1 : X_{j+t} = X_j\}, \quad j = 1, \ldots, k,$$

to be the return time of the $j$th block.
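To illustrate Definition 1.1, here is a minimal Python sketch. The function names `to_blocks` and `return_times` are our own, not the paper's. It assumes the whole sample is held in memory as a Python sequence. If one of the first $k$ blocks never reappears, it raises `ValueError`, since $S_j$ is then not observed.

```python
from collections.abc import Hashable, Sequence


def to_blocks(z: Sequence, ell: int) -> list[tuple]:
    """Cut (Z_1, Z_2, ...) into consecutive non-overlapping blocks X_j of length ell."""
    n_blocks = len(z) // ell
    return [tuple(z[j * ell:(j + 1) * ell]) for j in range(n_blocks)]


def return_times(blocks: Sequence[Hashable], k: int) -> list[int]:
    """Return [S_1, ..., S_k] with S_j = min{t >= 1 : X_{j+t} = X_j}."""
    next_pos: dict = {}  # block value -> nearest later index at which it occurs
    s = [0] * k
    # Scan right to left, so next_pos always holds the nearest later occurrence.
    for idx in range(len(blocks) - 1, -1, -1):
        x = blocks[idx]
        if idx < k:
            if x not in next_pos:
                raise ValueError(f"block {idx + 1} does not reappear in the sample")
            s[idx] = next_pos[x] - idx
        next_pos[x] = idx
    return s


# Blocks of length 2: (0,1), (1,0), (0,1), (1,1), (1,0)
z = [0, 1, 1, 0, 0, 1, 1, 1, 1, 0]
print(return_times(to_blocks(z, 2), k=2))  # [2, 3]
```

Scanning from right to left keeps the nearest later occurrence of each block value, so all $k$ return times come out of a single pass.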

It appears that this definition dates back to Maurer [?]. The main result of
this paper is that if the number and size of blocks grow appropriately, then
the $S_j$ satisfy a Central Limit Theorem:

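Theorem 1.2 Suppose that $(Z_i)$ is an independent identically distributed
finite alphabet process with entropy $H$. Write $q_{\max} < 1$ for the maximum
probability of any symbol. If, as block length $\ell \to \infty$, the number of blocks
$k(\ell) \to \infty$ in such a way that $\lim_{\ell \to \infty} k(\ell)^{3/2} \ell q_{\max}^{\ell} = 0$, then

$$\frac{\sum_{i=1}^{k(\ell)} \left( \log S_i - \ell H \log 2 + \gamma \right)}{\sqrt{k(\ell) \pi^2/6}} \xrightarrow{\mathcal{D}} N(0, 1).$$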

Remark 1.3 

1. Here $\xrightarrow{\mathcal{D}}$ denotes convergence in distribution, and $\gamma$ is Euler's constant,
$\gamma = 0.577216\ldots$.

2. Note that to fit in with conventions in information theory, the entropy 
H is calculated using logarithms to base 2. If entropy were calculated 
using natural logarithms, the log 2 term could be omitted. 
3. For equidistributed processes, where each symbol occurs with probability
$q$, Proposition 3.5 shows that we can relax the assumption on the
rate of growth of $k(\ell)$, and require only that $k(\ell) \ell q^\ell \to 0$. Indeed,
simulations in Section 4 suggest that a weaker condition, namely
$k(\ell) \ell 2^{-\ell H} \to 0$, may be sufficient to ensure convergence for all finite
alphabet independent processes with entropy $H$. The faster permitted growth
of $k(\ell)$ in the case of equidistribution is useful, since we will often test
a null hypothesis of equidistribution.
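To make the normalisation in Theorem 1.2 concrete, the sketch below computes the statistic from observed return times. It assumes the symbol distribution is known, as it is under a null hypothesis. Following Remark 1.3, it uses natural logarithms for $\log S_i$ and base-2 entropy for $H$. The names `entropy_bits`, `clt_statistic` and `EULER_GAMMA` are illustrative.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant, gamma


def entropy_bits(probs) -> float:
    """Entropy H of a finite distribution, using logarithms to base 2."""
    return -sum(q * math.log2(q) for q in probs if q > 0)


def clt_statistic(s, ell: int, h_bits: float) -> float:
    """sum_i (log S_i - ell*H*log 2 + gamma) / sqrt(k * pi^2 / 6), with natural logs."""
    k = len(s)
    centred = sum(math.log(si) - ell * h_bits * math.log(2) + EULER_GAMMA for si in s)
    return centred / math.sqrt(k * math.pi ** 2 / 6)


# Under an equidistributed binary null hypothesis, H = 1 bit per symbol.
print(entropy_bits([0.5, 0.5]))    # 1.0
print(entropy_bits([0.75, 0.25]))  # about 0.811
```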



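Corollary 1.4 Under the conditions of Theorem 1.2 above, the estimator

$$\widehat{H}_k = \frac{1}{k(\ell)\, \ell \log 2} \sum_{i=1}^{k(\ell)} \left( \log S_i + \gamma \right)$$

satisfies

$$\ell \log 2 \sqrt{\frac{6 k(\ell)}{\pi^2}} \left( \widehat{H}_k - H \right) \xrightarrow{\mathcal{D}} N(0, 1),$$

so is asymptotically normal and consistently estimates the entropy.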
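Rearranging Corollary 1.4 gives an interval estimate for $H$. The sketch assumes the normal approximation is adequate for the chosen $k$ and $\ell$. The function names and the default multiplier $z = 1.96$ (a nominal 95% level) are our own choices.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant, gamma


def entropy_estimate(s, ell: int) -> float:
    """H_hat = sum_i (log S_i + gamma) / (k * ell * log 2), in bits per symbol."""
    k = len(s)
    return sum(math.log(si) + EULER_GAMMA for si in s) / (k * ell * math.log(2))


def entropy_interval(s, ell: int, z: float = 1.96) -> tuple[float, float]:
    """H_hat -/+ z * (pi / sqrt 6) / (ell * log 2 * sqrt k), from Corollary 1.4."""
    k = len(s)
    h_hat = entropy_estimate(s, ell)
    half_width = z * (math.pi / math.sqrt(6)) / (ell * math.log(2) * math.sqrt(k))
    return h_hat - half_width, h_hat + half_width
```

For example, with $k = 1000$ and $\ell = 13$ the half-width is about 0.0088 bits.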



If the process is independent and equidistributed on a finite alphabet, each
block occurs at each time with probability $p = |A|^{-\ell}$. Hence each of the $S_j$
is a geometric random variable, with $\mathbb{P}(S_j = r) = p(1-p)^{r-1}$ (we call this a
Geom$(p)$ variable). In general, if the process is independent and identically
distributed (IID) with entropy $H$, it satisfies the Asymptotic Equipartition
Property (AEP), so that asymptotically there are $\approx 2^{\ell H}$ blocks of length $\ell$
which appear, each with probability $\approx 2^{-\ell H}$; conditioned on the value of
$X_j$, $S_j$ is a geometric random variable. However, even though the symbols $Z_i$
are independent, the return times $S_j$ are dependent, so we need to understand
the dependence structure to prove Theorem 1.2.
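The geometric law of a single return time is easy to check by simulation. The sketch assumes IID equidistributed bits. Each block of $\ell$ bits is then a uniform integer in $\{0, \ldots, 2^\ell - 1\}$, so block values are sampled directly with Python's `random.getrandbits`. The function name is ours. The sample mean of $S_1$ should be close to $1/p = 2^\ell$.

```python
import random
import statistics


def sample_return_time(ell: int, rng: random.Random) -> int:
    """One draw of S_1 for IID uniform blocks of ell bits."""
    x1 = rng.getrandbits(ell)
    t = 1
    while rng.getrandbits(ell) != x1:  # X_{1+t} != X_1
        t += 1
    return t


rng = random.Random(2024)
draws = [sample_return_time(4, rng) for _ in range(20000)]
print(statistics.fmean(draws))  # close to 2**4 = 16, the mean of Geom(1/16)
```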

In Section 1.2 we describe some results concerning similar return time
definitions made by other authors. Section 1.3 describes two possible
applications of these results. In Section 2 we prove Theorem 1.2 using an
argument based on asymptotic independence, first transforming the problem into
a similar one by eliminating the possibility of early matches. Section 3 gives
a proof under weaker conditions for equidistributed random variables, using
negative association. Section 4 contains the results of some simulations.

In future work, we hope to extend these results to general stationary
processes, under a suitable mixing condition, and to prove similar results for
other definitions of match length.
1.2 Other similar definitions

We briefly describe some other work concerning similar quantities. This is 
by no means an exhaustive list, but merely gives a flavour of some of the 
alternative approaches which exist. 

1. Definition 1.5 (Overlapping return time) Define random variables
$T_k$ to be the time before the block $Z_1^k$ is next seen:

$$T_k = \min\{t \geq 1 : Z_{t+1}^{t+k} = Z_1^k\}.$$

Kac's Lemma [?] shows that for any stationary ergodic process
$\mathbb{E}[T_k \mid Z_1^k = z_1^k] = 1/\mathbb{P}(Z_1^k = z_1^k)$. This was developed by Kim [?], who gave the
limiting behaviour of $\mathbb{E}[T_k \mathbb{P}(Z_1^k)]$ for independent processes, and by
Wyner, who showed that

Theorem 1.6 If $(Z_i)$ is a Markov chain then

$$\frac{\log_2 T_k - kH}{\sqrt{kV}} \xrightarrow{\mathcal{D}} N(0, 1) \quad \text{as } k \to \infty.$$

Here $V$ is the information variance, $V = \lim_{k \to \infty} \operatorname{Var}(-\log_2 \mathbb{P}(Z_1^k))/k$. For
independent processes:

$$V = \operatorname{Var}(-\log_2 \mathbb{P}(Z_1)) = \sum_z \mathbb{P}(Z_1 = z)\left(-\log_2 \mathbb{P}(Z_1 = z) - H\right)^2.$$

See also Corollary 2 of Kontoyiannis [?], who showed that this result
holds for general stationary $(Z_i)$ under explicit mixing conditions.
Wyner and Ziv [?], Ornstein and Weiss [?] and Gao [5] consider similar
quantities.

2. Definition 1.7 (Grassberger prefix) Given $n$ and $1 \leq i \leq n$, define

$$R_{i,n}(Z_1^n) = \inf\{t : Z_i^{i+t-1} \neq Z_j^{j+t-1} \text{ for all } j \neq i\}.$$

In words, $R_{i,n}(Z_1^n)$ is the length of the shortest string started at position
$i$ that differs from all the others of equal length started at $j$, for
$1 \leq j \leq n$. This quantity was introduced by Grassberger [7], and studied by
Kontoyiannis and Suhov [?], Quas [12] and Shields [20], [21], partly because it
allows good entropy estimation for an ergodic process with a suitable
degree of mixing. For example, Theorem 1 of [?] shows:

Theorem 1.8 If the finite alphabet process $(Z_i)$ is ergodic with entropy $H$,
and satisfies a Doeblin condition, then

$$\lim_{n \to \infty} \frac{n \log_2 n}{\sum_{i=1}^n R_{i,n}(Z_1^n)} = H, \quad \text{almost surely.}$$



3. Lempel-Ziv coding Another problem with similar features is finding
the asymptotic behaviour of the number of codewords in the Lempel-Ziv
parsing (see Section 12.10 of Cover and Thomas [?]). Ziv [23]
made a conjecture concerning the number of codewords. However, Aldous
and Shields [?] were only able to resolve the problem for IID
equidistributed processes, and it took careful analysis by Jacquet and
Szpankowski [?] to extend their results to IID asymmetric processes.
For example, Theorem 1A of [?] shows that

Theorem 1.9 Given a binary asymmetric ($\mathbb{P}(Z_1 = 0) \neq 1/2$) IID
process, $L_m$, the total length of the $m$ words in a Lempel-Ziv tree,
satisfies

$$\frac{L_m - \mathbb{E}L_m}{\sqrt{\operatorname{Var}(L_m)}} \xrightarrow{\mathcal{D}} N(0, 1) \quad \text{as } m \to \infty,$$

where $\mathbb{E}L_m = (m \log_2 m)/H + O(m)$ and $\operatorname{Var}(L_m) = V m \log_2 m / H^3 +
O(m)$.
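For comparison with Definition 1.7, here is a brute-force sketch of the Grassberger prefixes $R_{i,n}$ and the estimator suggested by Theorem 1.8. It runs in $O(n^2)$ time. We adopt the convention (ours, not the paper's) that a comparison running past the end of the available sample counts as a mismatch. The helper names are illustrative, and the code is meant only to make the definition concrete.

```python
import math
from collections.abc import Sequence


def common_prefix(z: Sequence, i: int, j: int) -> int:
    """Length of the longest common prefix of z[i:] and z[j:]."""
    m = 0
    while i + m < len(z) and j + m < len(z) and z[i + m] == z[j + m]:
        m += 1
    return m


def grassberger_prefixes(z: Sequence, n: int) -> list[int]:
    """R_{i,n}: the shortest t such that the length-t string at i differs from the
    length-t string at every other start j (0-indexed starts 0..n-1, n >= 2)."""
    return [1 + max(common_prefix(z, i, j) for j in range(n) if j != i)
            for i in range(n)]


def grassberger_entropy(z: Sequence, n: int) -> float:
    """n log2 n / sum_i R_{i,n}, the statistic appearing in Theorem 1.8 (bits)."""
    return n * math.log2(n) / sum(grassberger_prefixes(z, n))


print(grassberger_prefixes("0110", 4))  # [2, 2, 2, 2]
print(grassberger_entropy("0110", 4))   # 1.0
```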

Notice that Theorems 1.6, 1.8 and 1.9 differ in character from our
Theorem 1.2. For example, Theorem 1.8 proves a law of large numbers for
the Grassberger prefixes, showing that a statistic based on them acts as an
entropy estimator. However, it does not tell us the rate of convergence of
the estimator. Similarly, although Theorems 1.6 and 1.9 give asymptotic
normality, they both refer to statistics calculated with respect to one fixed 
point. It is possible that this fixed point could be unrepresentative, and so 
our result is stronger in the sense that it averages over a number of different 
starting points. 

1.3 Applications 

We briefly describe two applications of these results. 
1. (Cryptography) The problem of deciding whether binary bits $(Z_i)$
were generated by an IID equidistributed process arises in cryptography.
Bits generated in this way can be used as a perfectly secure one-time
pad to transmit a message $(Y_i)$. This system is secure in the sense
that the transmitted bits $Y_i \oplus Z_i$ are independent of the message, so
no inference about $(Y_i)$ can be made from them. Equivalently, Shannon's
Second Coding Theorem (see for example Theorem 8.7.1 of [?]) implies
that the binary symmetric channel with error probability $p = 1/2$ has
capacity $C = 0$. If the $(Z_i)$ were not independent and equidistributed,
then, given large enough $n$, it might be possible to infer properties of the
$(Z_i)$ and perhaps read the message $(Y_i)$.

2. (Number Theory) Recall that a number is said to be normal to base
$b$ if the limiting proportion of each digit in its base $b$ expansion is $1/b$. A
number which is normal to all bases $b$ is referred to as normal. Ergodic
theory shows that almost all numbers are normal, but it is hard to prove
that any particular number has this property. For example, Bailey and
Crandall [2] prove that a particular class of numbers (including the
so-called Stoneham and Korobov numbers) has the normal property.
On the other hand, in the same paper [2] they discuss the fact that
constants including $\pi$, $e$, $\ln 2$ and $\zeta(3)$ are not known to be normal.
Weisstein gives a review of results concerning normal numbers.
An informal statement of the property of normality to base $b$ is that
the digits of the number 'look as if they were generated by an IID
equidistributed process', which we hope to be able to test.

Kim [?] gave computational results concerning the speed of convergence of
estimators based on overlapping matches, hoping to detect processes which
are not Bernoulli. Similarly, Bradley and Suhov [3] used theoretical results
concerning the Grassberger prefixes (see Definition 1.7) to consider the
normality of constants such as $\pi$, $e$ and $\gamma$. We give some computational
results in Section 4.
2 Proof of Main Theorem



2.1 Avoiding early matches 

The difficulty in analysing the dependence structure of the random variables
$S_i$ introduced in Definition 1.1 is that 'early matches' can occur at $i$. That
is, it may be that $S_i < k - i$. The possibility of early matches leads to a
complicated situation of case splitting, according to where such early matches
occur.

To avoid this, we introduce a very similar sequence of random variables $(R_i)$
in Definition 2.1 below. We use two main ideas to prove Theorem 1.2, the
Central Limit Theorem for the $S_i$.

First, $S_i = R_i + (k - i)$ unless there has been an early match. By controlling
the probability of an early match, we show that a suitably scaled version of
$\sum_i (\log S_i - \log R_i)$ tends to zero in probability, and so a limit law for the
$R_i$ passes over to a limit law for the $S_i$. The formal statement is given in
Lemma 2.2 below.

Then, in Proposition 2.6 we establish a Central Limit Theorem for the $R_i$,
using explicit bounds on conditional probabilities, which show that the
variables are asymptotically independent.

Definition 2.1 Given a realisation of $X_1, \ldots, X_k$, we define $D$, the set of
positions which do not see an early match. That is,

$$D = \{i : X_i \neq X_j \text{ for } j = i+1, \ldots, k\}.$$

For each $i \in D$, we take $b_i = X_i$. For each $i \notin D$, we pick $b_i$ to be a random
element, chosen uniformly from the set of elements of $A^\ell$ not yet seen.

Define random variables $R_j$ to be the time waited between time $k$ and the first
appearance of value $b_j$:

$$R_j = \min\{t \geq 1 : X_{k+t} = b_j\}, \quad j = 1, \ldots, k.$$
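Definition 2.1 can be sketched as follows. Three assumptions are made. The alphabet of possible blocks is passed in explicitly, which is only feasible for small $\ell$. Substitutes for positions outside $D$ are drawn without replacement from the unseen blocks, so that the targets $b_i$ are distinct; this is consistent with the definition but not spelled out by it. The function names `targets` and `waiting_times` are hypothetical.

```python
import random
from collections.abc import Hashable, Sequence


def targets(blocks: Sequence[Hashable], k: int, alphabet: Sequence[Hashable],
            rng: random.Random) -> tuple[list[bool], list[Hashable]]:
    """Return (in_D, b): in_D[i] says whether position i+1 sees no early match."""
    first_k = list(blocks[:k])
    in_d = [all(first_k[i] != first_k[j] for j in range(i + 1, k)) for i in range(k)]
    seen = set(first_k)
    unseen = [a for a in alphabet if a not in seen]
    rng.shuffle(unseen)
    # Keep X_i when i is in D, otherwise take a fresh unseen block value.
    b = [first_k[i] if in_d[i] else unseen.pop() for i in range(k)]
    return in_d, b


def waiting_times(blocks: Sequence[Hashable], k: int, b: Sequence[Hashable]) -> list[int]:
    """R_j = min{t >= 1 : X_{k+t} = b_j}; raises KeyError if b_j never appears."""
    first_time: dict = {}
    for t, x in enumerate(blocks[k:], start=1):
        first_time.setdefault(x, t)
    return [first_time[bj] for bj in b]


blocks = ["a", "b", "a", "c", "b", "a"]
in_d, b = targets(blocks, 3, alphabet=["a", "b", "c"], rng=random.Random(0))
print(in_d, b, waiting_times(blocks, 3, b))
# [False, True, True] ['c', 'b', 'a'] [1, 2, 3]
```

Here the first block `a` sees an early match at position 3, so it is replaced by the unseen block `c`. For positions 2 and 3, one can check that $S_i = R_i + (k - i)$.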

Lemma 2.2 Suppose that $(Z_i)$ is an independent identically distributed finite
alphabet process with entropy $H$. If, as block length $\ell \to \infty$, the number
of blocks $k(\ell) \to \infty$ in such a way that $\lim_{\ell \to \infty} k(\ell)^{3/2} \ell q_{\max}^\ell = 0$, then the
difference term

$$\frac{1}{\sqrt{k}} \sum_{i=1}^{k} \left( \log S_i - \log R_i \right)$$

tends to zero in probability.

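Proof The key observation is that $S_i = R_i + (k - i)$, unless $i \in D^c$. Further,
$1 \leq S_i \leq R_i + (k - i)$. This means that we can decompose

$$|\log S_i - \log R_i| \leq \log\left(1 + \frac{k - i}{R_i}\right) + \log(R_i + k)\,\mathbb{I}(i \in D^c) \leq \frac{k}{R_i} + \log(R_i + k)\,\mathbb{I}(i \in D^c).$$

Hence, for any $\delta > 0$, Markov's inequality gives

$$\mathbb{P}\left(\frac{1}{\sqrt{k}}\sum_{i=1}^k \frac{k}{R_i} > \delta\right) \leq \frac{k^{3/2}}{\delta}\max_i \mathbb{E}\frac{1}{R_i} = O\left(k^{3/2}\ell q_{\max}^\ell\right),$$

since $R \sim$ Geom$(p)$ has $\mathbb{E}(1/R) = -p\log p/(1 - p)$, which is $O(q_{\max}^\ell \ell)$ for
$p \leq q_{\max}^\ell$. Similarly, we know that

$$\mathbb{P}\left(\frac{1}{\sqrt{k}}\sum_{i=1}^k \log(R_i + k)\,\mathbb{I}(i \in D^c) > \delta\right) \leq \frac{1}{\delta\sqrt{k}}\sum_{i=1}^k \mathbb{E}\log(R_i + k)\,\mathbb{P}(i \in D^c) = O\left(k^{3/2}\ell q_{\max}^\ell\right),$$

since $\mathbb{P}(i \in D^c) = 1 - \mathbb{P}(i \in D) \leq 1 - (1 - q_{\max}^\ell)^k \leq k q_{\max}^\ell$, independently
of $R_i$, and $\mathbb{E}\log(R_i + k) = O(\ell)$. □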
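The identity $\mathbb{E}(1/R) = -p\log p/(1-p)$ used above can be checked numerically by truncating the series. In this sketch, truncation after $60/p$ terms is our choice; the neglected tail then has weight about $e^{-60}$. The function name is illustrative.

```python
import math


def mean_inverse_geometric(p: float) -> float:
    """Truncated series for E(1/R) = sum_{r>=1} p (1-p)^(r-1) / r, with R ~ Geom(p)."""
    total, weight = 0.0, p  # weight = P(R = r)
    for r in range(1, int(60 / p) + 1):
        total += weight / r
        weight *= 1 - p
    return total


p = 0.01
print(mean_inverse_geometric(p), -p * math.log(p) / (1 - p))  # the two values agree closely
```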

2.2 Mean and variance of $\log R_i$

We first find the leading order terms in $\mathbb{E}\log R_i$ and $\operatorname{Var}\log R_i$ for all $i$.
We use the following lemma, the simplest form of the Euler-Maclaurin sum
formula.
Lemma 2.3 For any differentiable function $f$ such that $f(x) \to 0$ as $x \to \infty$,

$$\left| \sum_{i=1}^{\infty} f(i) - \frac{f(1)}{2} - \int_1^{\infty} f(x)\,dx \right| \leq \frac{1}{2} \int_1^{\infty} |f'(x)|\,dx. \qquad (1)$$



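Proof Note that (integrating by parts) for all differentiable functions $f$, and
for all $a$:

$$\int_a^{a+1} f(x)\,dx = \frac{1}{2}\left(f(a) + f(a+1)\right) - \int_a^{a+1} f'(x)\left(x - a - \frac{1}{2}\right) dx,$$

which implies that

$$\int_a^{a+1} f(x)\,dx - \frac{1}{2}\left(f(a) + f(a+1)\right) = R,$$

where $|R| \leq \left(\int_a^{a+1} |f'(x)|\,dx\right)/2$. Summing such results from $a = 1$ to $a = \infty$,
and noting that $\sum_{a \geq 1} (f(a) + f(a+1))/2 = \sum_{i \geq 1} f(i) - f(1)/2$, we deduce that
Equation (1) holds. □

Lemma 2.4 For $R \sim$ Geom$(p)$:

$$\mu(p) := \mathbb{E}(\log R) = -\gamma - \log p + O(p), \qquad (2)$$

$$\sigma^2(p) := \operatorname{Var}(\log R) = \frac{\pi^2}{6} + O(p \log p), \qquad (3)$$

$$\mathbb{E}\left|\log R - \mu(p)\right|^3 \leq K, \qquad (4)$$

where, as before, $\gamma$ is Euler's constant, and $K$ is a finite constant.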



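Proof Take $c = -\log(1 - p)$ and $f(x) = e^{-cx}\log x$, and use the fact that
$|f'(x)| = |e^{-cx}/x - c e^{-cx}\log x| \leq |e^{-cx}/x| + c|e^{-cx}\log x|$. Since $f(1) = 0$, we
deduce by Equation (1) that the difference between the integral
$I = \int_1^\infty e^{-cx}\log x\,dx$ and the sum $S = \sum_{i=1}^\infty e^{-ci}\log i$ satisfies

$$|S - I| \leq \frac{1}{2}\int_1^\infty \frac{e^{-cx}}{x}\,dx + \frac{c}{2}\int_1^\infty e^{-cx}\log x\,dx = \frac{1}{2}\left(\left[e^{-cx}\log x\right]_1^\infty + cI\right) + \frac{c}{2}I = cI.$$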
Using the fact (writing $\Gamma(0, \cdot)$ for the incomplete gamma function) that
$I = \Gamma(0, c)/c = (-\gamma - \log c + c + O(c^2))/c$, we deduce that

$$\mathbb{E}\log R = \sum_{x=1}^\infty p(1-p)^{x-1}\log x = \frac{p}{1-p}\,S = \frac{p}{1-p}\cdot\frac{\Gamma(0, c)}{c}\,(1 + \epsilon) = (-\gamma - \log p) + O(p),$$

where $|\epsilon| \leq c$.

In the same way, for $f(x) = e^{-cx}(\log x)^2$, the derivative satisfies
$|f'(x)| = |2e^{-cx}\log x/x - c e^{-cx}(\log x)^2| \leq |2e^{-cx}\log x/x| + c|e^{-cx}(\log x)^2|$.
As above,

$$|S - I| = \left|\sum_{i=1}^\infty e^{-ci}(\log i)^2 - \int_1^\infty e^{-cx}(\log x)^2\,dx\right| \leq cI.$$

Using the fact that $I = \pi^2/(6c) + (\gamma + \log c)^2/c + O(1)$, we deduce that
$\mathbb{E}(\log R)^2 - \mu(p)^2 = \pi^2/6 + O(p\log p)$.

Finally, we bound the centred absolute third moment $\mathbb{E}|\log R - \mu(p)|^3$. We
partition the real line into 3 intervals: firstly the set
$A_1 = \{x : |\log x - \mu(p)| \leq 1\}$, secondly $A_2 = \{x : \log x - \mu(p) > 1\}$, and thirdly
$A_3 = \{x : \log x - \mu(p) < -1\}$. We define integrals
$K_i = \mathbb{E}\left(|\log R - \mu(p)|^3\,\mathbb{I}(R \in A_i)\right)$ for $i = 1, 2, 3$.
Clearly $K_1 \leq 1$. By Chernoff's bound, for $t > 1$:

$$\mathbb{P}(\log R - \mu(p) > t) \leq \frac{\mathbb{E}R^s}{\exp(s(t + \mu(p)))} = \frac{(2 - p)/p^2}{\exp(-2\gamma + O(p))/p^2}\exp(-2t) \leq 2\exp(2\gamma)\exp(-2t),$$

taking $s = 2$. Hence $K_2 = \int_1^\infty 3t^2\,\mathbb{P}(\log R - \mu(p) > t)\,dt \leq 4$. In a similar
fashion,

$$\mathbb{P}(\log R - \mu(p) < -t) = \mathbb{P}\left(R < \exp(\mu(p) - t)\right) \leq 1 - \exp\left(-e^{-\gamma - t}(1 + O(p))\right),$$

so $K_3 \leq 4$. Overall then, we can take $K = 9$. □
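Equations (2) and (3) can be checked numerically by summing the series for $\mathbb{E}\log R$ and $\mathbb{E}(\log R)^2$ directly. In this sketch, truncation after $60/p$ terms is our choice and the function name is illustrative. For small $p$, the mean should be close to $-\gamma - \log p$ and the variance close to $\pi^2/6 \approx 1.645$.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant, gamma


def log_geometric_moments(p: float) -> tuple[float, float]:
    """Return (E log R, Var log R) for R ~ Geom(p), by truncated summation."""
    m1 = m2 = 0.0
    weight = p  # P(R = r) = p (1-p)^(r-1)
    for r in range(1, int(60 / p) + 1):
        lr = math.log(r)
        m1 += weight * lr
        m2 += weight * lr * lr
        weight *= 1 - p
    return m1, m2 - m1 * m1


p = 1e-4
mu, var = log_geometric_moments(p)
print(mu - (-EULER_GAMMA - math.log(p)), var - math.pi ** 2 / 6)  # both small
```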



2.3 Asymptotic independence 

Next we prove a lemma that shows that the $R_i$ are approximately independent,
giving explicit bounds on the difference between the joint probability
distribution and the product of the marginals.
Lemma 2.5 Suppose that $(Z_i)$ is an independent identically distributed finite
alphabet process with entropy $H$. For $(R_i)$ as defined in Definition 2.1, write
$p_j$ for the probability of the block $b_j$. For any $s$, $m$, and for any
$\mathbf{a} = (a_1, \ldots, a_{m-1})$, writing $\mathbf{R} = (R_1, \ldots, R_{m-1})$:

$$\left(1 - \frac{p_m}{1 - S^*}\right)^{s-1} \leq \mathbb{P}(R_m \geq s \mid \mathbf{R} = \mathbf{a}) \leq (1 - p_m)^{s-m},$$

where $S^* = p_1 + \cdots + p_{m-1}$.

Proof Given the values of $R_1, \ldots, R_{m-1}$, we can write down an explicit
expression for the distribution of $R_m$:

$$\mathbb{P}(R_m \geq s \mid \mathbf{R} = \mathbf{a}) = \prod_{i=1}^{s-1}\mathbb{P}(X_{k+i} \neq b_m \mid \mathbf{R} = \mathbf{a}). \qquad (5)$$

We consider this product (5) term by term, for each value of $i$. If for some
$j \leq m - 1$ we have $a_j = i$, then $X_{k+i} = b_j$, so automatically $X_{k+i} \neq b_m$, and so
the contribution from that $i$ to the product (5) is 1.

Otherwise, if $a_j \neq i$ for all $j \leq m - 1$, then

$$\mathbb{P}(X_{k+i} \neq b_m \mid \mathbf{R} = \mathbf{a}) = 1 - \mathbb{P}(X_{k+i} = b_m \mid \mathbf{R} = \mathbf{a}) = 1 - \frac{p_m}{1 - S_i},$$

where $S_i = \sum_{j=1}^{m-1} p_j\,\mathbb{I}(a_j > i) \geq 0$. This is a decreasing function of $S_i$.

It is clear that the product (5) is maximised when the $m - 1$ values $a_j$
occur in the first $m - 1$ places, that is, when $\{a_1, \ldots, a_{m-1}\} = \{1, \ldots, m - 1\}$.
In this case the value of (5) becomes $(1 - p_m)^{s-m}$.

Similarly, the product (5) is minimised when $S_i$ is maximised for each $i$, that
is, when $a_j \geq s$ for each $j$. In that case, $S_i = \sum_{j=1}^{m-1} p_j = S^*$ for each $i$, and
the product becomes $(1 - p_m/(1 - S^*))^{s-1}$. □
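The product (5) can be evaluated exactly for given block probabilities and conditioning values, which gives a concrete check of the two bounds in Lemma 2.5. In this sketch, `p` lists $p_1, \ldots, p_m$ and `a` lists the observed values $a_1, \ldots, a_{m-1}$. The values are assumed distinct, as they must be for distinct targets. The function name is hypothetical.

```python
import math


def conditional_survival(p: list[float], a: list[int], s: int) -> float:
    """P(R_m >= s | R = a) via the product (5); p = [p_1..p_m], a = [a_1..a_{m-1}]."""
    p_m = p[-1]
    hits = set(a)
    prob = 1.0
    for i in range(1, s):
        if i in hits:
            continue  # X_{k+i} is some b_j != b_m, so the factor is 1
        s_i = sum(pj for pj, aj in zip(p[:-1], a) if aj > i)
        prob *= 1 - p_m / (1 - s_i)
    return prob


p, s = [0.01, 0.02, 0.03], 10
m = len(p)
lower = (1 - p[-1] / (1 - sum(p[:-1]))) ** (s - 1)
upper = (1 - p[-1]) ** (s - m)
for a in ([1, 2], [5, 2], [3, 8], [10, 12]):
    print(a, lower, conditional_survival(p, a, s), upper)
```

The cases `a = [1, 2]` and `a = [10, 12]` attain the upper and lower bounds respectively, as in the proof.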

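Proposition 2.6 Suppose that $(Z_i)$ is an independent identically distributed
finite alphabet process with entropy $H$. For $(R_i)$ as defined in Definition 2.1,

$$\left| \mathbb{E}\exp\left(\frac{i\theta}{\sqrt{k}}\sum_{j=1}^{m}\log R_j\right) - \mathbb{E}\exp\left(\frac{i\theta}{\sqrt{k}}\sum_{j=1}^{m-1}\log R_j\right)\mathbb{E}\exp\left(\frac{i\theta}{\sqrt{k}}\log R_m\right) \right| \leq \theta^2\, O\left(\ell\, k(\ell)^{1/2} q_{\max}^{\ell}\right).$$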
Proof Adapting Equation (22) of Newman [?], for any complex continuously
differentiable functions $f$ and $g$, and for random variables $U$ and $V$:

$$\operatorname{Cov}(f(U), g(V)) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f'(u)\,g'(v)\,H_{U,V}(u, v)\,du\,dv, \qquad (6)$$

where

$$H_{U,V}(u, v) = \mathbb{P}(U > u, V > v) - \mathbb{P}(U > u)\mathbb{P}(V > v) = -\mathbb{P}(U > u, V \leq v) + \mathbb{P}(U > u)\mathbb{P}(V \leq v).$$

We take $U = \log R_m$ and $V = \sum_{j=1}^{m-1}\log R_j$. We will find non-negative
functions $h_-$ and $h_+$ such that $-h_-(u, v) \leq H_{U,V}(u, v) \leq h_+(u, v)$ for all $u$
and $v$. Since $U$ and $V$ take non-negative values only, if $u < 0$ or $v < 0$ then
$H_{U,V}(u, v) = 0$. Hence, taking $f(t) = g(t) = \exp(i\theta t/\sqrt{k})$, Equation (6)
simplifies to

$$\left|\operatorname{Cov}\left(\exp\left(\frac{i\theta U}{\sqrt{k}}\right), \exp\left(\frac{i\theta V}{\sqrt{k}}\right)\right)\right| \leq \frac{\theta^2}{k}\int_0^\infty\int_0^\infty |H_{U,V}(u, v)|\,dv\,du \leq \frac{\theta^2}{k}\left(\int_0^\infty\int_0^\infty h_-(u, v)\,dv\,du + \int_0^\infty\int_0^\infty h_+(u, v)\,dv\,du\right). \qquad (7)$$



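We know that $\int_0^\infty \mathbb{P}(U > u)\,du = \mathbb{E}U$ and $\int_0^\infty \mathbb{P}(V > v)\,dv = \mathbb{E}V$, and, writing
$\mu = \mathbb{E}V$, that $\int_0^\mu \mathbb{P}(V \leq v)\,dv + \int_\mu^\infty \mathbb{P}(V > v)\,dv = \mathbb{E}|V - \mu|$. Further, we can
evaluate

$$f(p) := \int_0^\infty (1 - p)^{e^u - 1}\,du = \frac{\Gamma(0, -\log(1 - p))}{1 - p}.$$

Since $f(p)$ has an increasing, but negative, gradient, with
$-f'(p) \leq 1/(-(1 - p)\log(1 - p)) \leq 1/(p(1 - p))$, we know that for $p \leq q$:
$f(p) - f(q) \leq (q - p)/(p(1 - p))$. Since $S^* \leq k q_{\max}^\ell$, this means that

$$\int_0^\infty \left((1 - p_m)^{e^u - 1} - \left(1 - \frac{p_m}{1 - S^*}\right)^{e^u - 1}\right) du = f(p_m) - f\left(\frac{p_m}{1 - S^*}\right) = O(k(\ell) q_{\max}^\ell). \qquad (8)$$

We rearrange Lemma 2.5 and sum over values of $\mathbf{a}$ such that $\sum_j \log a_j > v$
or $\sum_j \log a_j \leq v$.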
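1. For $v \geq \mathbb{E}V$, we find that

$$h_-(u, v) \leq \left((1 - p_m)^{e^u - 1} - \left(1 - \frac{p_m}{1 - S^*}\right)^{e^u - 1}\right)\mathbb{P}(V > v),$$

and similarly we can take $h_+(u, v) \leq \left((1 - p_m)^{1-m} - 1\right)\mathbb{P}(U > u)\mathbb{P}(V > v)$.

Thus, by (8), over this region
$\int\!\!\int h_-(u, v)\,du\,dv \leq O(k(\ell) q_{\max}^\ell)\,\mathbb{E}|V - \mu| = O(k(\ell)^{3/2} q_{\max}^\ell)$ and
$\int\!\!\int h_+(u, v)\,du\,dv \leq \left((1 - p_m)^{1-m} - 1\right)(\mathbb{E}U)\,\mathbb{E}|V - \mu| = O(\ell\, k(\ell)^{3/2} q_{\max}^\ell)$.

2. For $v < \mathbb{E}V$, we find that

$$h_+(u, v) \leq \left((1 - p_m)^{e^u - 1} - \left(1 - \frac{p_m}{1 - S^*}\right)^{e^u - 1}\right)\mathbb{P}(V \leq v),$$

and similarly we can take $h_-(u, v) \leq \left((1 - p_m)^{1-m} - 1\right)\mathbb{P}(U > u)\mathbb{P}(V \leq v)$.

As before, the integrals satisfy $\int\!\!\int h_+(u, v)\,du\,dv = O(k(\ell)^{3/2} q_{\max}^\ell)$ and
$\int\!\!\int h_-(u, v)\,du\,dv = O(\ell\, k(\ell)^{3/2} q_{\max}^\ell)$.

Substituting in Equation (7), the result follows. □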
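The function $f(p)$ and the Lipschitz-type bound behind (8) can be checked numerically. This sketch evaluates $\Gamma(0, c) = E_1(c)$ by its standard power series. The series is adequate for the small arguments $c = -\log(1 - p)$ arising here, though it is not a general-purpose implementation. The helper names are ours.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant, gamma


def exp_integral_e1(x: float, terms: int = 60) -> float:
    """E_1(x) = Gamma(0, x) = -gamma - log x - sum_{n>=1} (-x)^n / (n * n!), small x > 0."""
    total = -EULER_GAMMA - math.log(x)
    term = 1.0
    for n in range(1, terms + 1):
        term *= -x / n  # term is now (-x)^n / n!
        total -= term / n
    return total


def f(p: float) -> float:
    """f(p) = Gamma(0, -log(1 - p)) / (1 - p)."""
    return exp_integral_e1(-math.log1p(-p)) / (1 - p)


p, q = 0.01, 0.012
print(f(p) - f(q), (q - p) / (p * (1 - p)))  # the first value should not exceed the second
```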



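2.4 Completing the proof of Theorem 1.2

The Lyapunov Central Limit Theorem (see for example Theorem 4.9 of [?])
implies that for independent $Y_1, \ldots, Y_k$, where $Y_i$ has mean $\mu_i$, variance $\sigma_i^2$
and finite centred absolute third moment $m_i = \mathbb{E}|Y_i - \mu_i|^3$, if

$$\frac{\sum_{i=1}^k m_i}{\left(\sum_{i=1}^k \sigma_i^2\right)^{3/2}} \to 0, \qquad (9)$$

then

$$\frac{\sum_{i=1}^k (Y_i - \mu_i)}{\sqrt{\sum_{i=1}^k \sigma_i^2}} \xrightarrow{\mathcal{D}} N(0, 1). \qquad (10)$$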



Proof of Theorem 1.2 Define a sequence of independent random variables
$(T_j)$ with $T_j \sim R_j$. Then Lemma 2.4 (in particular Equations (3) and (4))
shows that the Lyapunov condition (9) holds for $\log T_j$, so that, given values
$p_1, \ldots, p_k$,

$$\frac{\sum_{j=1}^k (\log T_j - \mu(p_j))}{\sqrt{k}} \xrightarrow{\mathcal{D}} N(0, v), \qquad (11)$$

where $v = \frac{1}{k}\sum_{j=1}^k \sigma^2(p_j) = \pi^2/6 + O(p\log p)$, and (by the Law of
Large Numbers) $\frac{1}{k}\sum_{j=1}^k \mu(p_j) \sim \ell H \log 2 - \gamma$.
By repeated use of Proposition 2.6, then,

$$\left| \mathbb{E}\exp\left(\frac{i\theta}{\sqrt{k}}\sum_{j=1}^k(\log R_j - \mu(p_j))\right) - \prod_{j=1}^k \mathbb{E}\exp\left(\frac{i\theta}{\sqrt{k}}(\log R_j - \mu(p_j))\right) \right|$$

is $O(\ell\, k(\ell)^{3/2} q_{\max}^\ell)$. So if this quantity tends to zero, then the Central Limit
Theorem for $\log T_j$, Equation (11), carries over to give a Central Limit Theorem
for $\log R_j$, and hence, by Lemma 2.2, for $\log S_j$. □
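As a worked instance of why condition (9) holds here, assume, as Lemma 2.4 provides, that each $m_j = \mathbb{E}|\log T_j - \mu(p_j)|^3 \leq K = 9$. Assume also that $\sigma^2(p_j) = \pi^2/6 + o(1)$ uniformly in $j$. Then

$$\frac{\sum_{j=1}^k m_j}{\left(\sum_{j=1}^k \sigma^2(p_j)\right)^{3/2}} \leq \frac{9k}{\left(k\,\frac{\pi^2}{6}\,(1 + o(1))\right)^{3/2}} = \frac{9}{(\pi^2/6)^{3/2}\sqrt{k}}\,(1 + o(1)) \to 0,$$

so the Lyapunov ratio decays like $k^{-1/2}$, with a constant of roughly $9/2.11 \approx 4.3$.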



3 Equidistribution and negative association 

In the case of equidistributed random variables, we can establish a Central
Limit Theorem under weaker conditions on $k(\ell)$, using negative association.
This property captures the sense of dependence in which one random variable
being large forces the others to be smaller. Formally:

Definition 3.1 A collection of real-valued random variables $(U_k)$ is negatively
associated (NA) if, for all increasing functions $f_1$ and $f_2$, the covariance

$$\operatorname{Cov}(f_1(U_i : i \in A_1), f_2(U_j : j \in A_2)) \leq 0, \qquad (12)$$

where $f_1$ and $f_2$ take arguments in disjoint sets of indices $A_1$ and $A_2$.

The negative association property proves useful in many situations, not least
because Newman [?] shows that the Central Limit Theorem holds for NA
sequences of random variables. Further, if $(U_k)$ forms an NA sequence, then
for any increasing function $f$, the $(f(U_k))$ are also an NA sequence.

Proposition 3.2 Suppose that $(Z_i)$ is an independent equidistributed process
with finite alphabet $A$. The $(R_i)$ introduced in Definition 2.1 are negatively
associated.
Proof Given the ordering $R_{\tau(1)} < R_{\tau(2)} < \cdots < R_{\tau(k)}$, the actual values
satisfy $R_{\tau(1)} \sim$ Geom$(kp)$, $R_{\tau(2)} - R_{\tau(1)} \sim$ Geom$((k-1)p)$ (independently),
and so on. By symmetry, the ordering permutation $\tau$ is uniformly distributed and
independent of these spacings. That is, if we define $W_i$ independent with
$W_i \sim$ Geom$((k+1-i)p)$, for $i = 1, \ldots, k$, and define $U_j = \sum_{i=1}^j W_i$, then
$R_{\tau(i)} = U_i$, or $R_i = U_{\tau^{-1}(i)}$.

As in the proof of Theorem 3.4 of Hu [?], it suffices to show that Equation
(12) holds for symmetric functions $f_1$ and $f_2$, with $A_1 = \{1, \ldots, r\}$ and
$A_2 = \{r+1, \ldots, k\}$. Given the functions $f_1$ and $f_2$, define

$$f_1^*(i_1, \ldots, i_r) = \mathbb{E}f_1(U_{i_1}, \ldots, U_{i_r}), \qquad f_2^*(i_{r+1}, \ldots, i_k) = \mathbb{E}f_2(U_{i_{r+1}}, \ldots, U_{i_k}).$$

If any index $i_l$ increases (for $l \in \{1, \ldots, r\}$), then (since $U_1 < U_2 < \cdots < U_k$)
so does $U_{i_l}$, and hence (since $f_1$ is increasing) so does $f_1^*$. That is, $f_1^*$ is an
increasing function of $(i_1, \ldots, i_r)$. Similarly, $f_2^*$ is also increasing.

For any permutation $\tau$, we define the increasing functions

$$g_1^*(\tau) = f_1^*(\tau^{-1}(1), \ldots, \tau^{-1}(r)) \quad \text{and} \quad g_2^*(\tau) = f_2^*(\tau^{-1}(r+1), \ldots, \tau^{-1}(k)).$$

Theorem 2.11 of Joag-Dev and Proschan [?] gives that the uniform distribution
on the set of permutations is negatively associated, so that

$$\mathbb{E}(g_1^*(\tau)g_2^*(\tau)) \leq \mathbb{E}(g_1^*(\tau))\,\mathbb{E}(g_2^*(\tau)). \qquad (13)$$

Now $\mathbb{E}g_1^*(\tau) = \mathbb{E}f_1^*(\tau^{-1}(1), \ldots, \tau^{-1}(r)) = \mathbb{E}f_1(U_{\tau^{-1}(1)}, \ldots, U_{\tau^{-1}(r)}) = \mathbb{E}f_1(R_1, \ldots, R_r)$.
Similarly, $\mathbb{E}g_2^*(\tau) = \mathbb{E}f_2(R_{r+1}, \ldots, R_k)$ and
$\mathbb{E}(g_1^*(\tau)g_2^*(\tau)) = \mathbb{E}f_1(R_1, \ldots, R_r)f_2(R_{r+1}, \ldots, R_k)$, so Equation (13) implies
Equation (12) as required. □
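The permutation step relies on the negative association of a uniformly random permutation. As a small exact illustration, not a proof, enumerating all permutations of $\{1, \ldots, k\}$ shows that $\operatorname{Cov}(\tau(1), \tau(2)) = -(k+1)/12 < 0$. The sketch uses exact rational arithmetic from Python's `fractions` module, and the function name is ours.

```python
from fractions import Fraction
from itertools import permutations


def coordinate_covariance(k: int) -> Fraction:
    """Exact Cov(tau(1), tau(2)) for tau uniform on the permutations of {1, ..., k}."""
    perms = list(permutations(range(1, k + 1)))
    n = len(perms)
    mean = Fraction(sum(t[0] for t in perms), n)  # E tau(1) = E tau(2) = (k+1)/2
    cross = Fraction(sum(t[0] * t[1] for t in perms), n)
    return cross - mean * mean


for k in range(2, 7):
    print(k, coordinate_covariance(k), Fraction(-(k + 1), 12))
```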

Lemma 3.3 Suppose that $(Z_i)$ is an independent equidistributed process with
finite alphabet $A$. If, as block length $\ell \to \infty$, the number of blocks $k(\ell) \to \infty$
in such a way that $k(\ell)\ell|A|^{-\ell} \to 0$, then the difference term

$$\frac{1}{\sqrt{k}}\sum_{i=1}^k (\log R_i - \log S_i)$$

tends to zero in probability.



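Proof As before, $S_i = R_i + (k - i)$, unless $i \in D^c$. Further,
$1 \leq S_i \leq R_i + (k - i)$. This means that we can decompose

$$|\log R_i - \log S_i| \leq |\log R_i - \log(R_i + (k - i))| + |\log(R_i + (k - i)) - \log S_i| \leq \frac{k}{R_i} + \log(R_i + k)\,\mathbb{I}(i \in D^c).$$

Since the $R_i$ are negatively associated, so are the $-k/R_i$, so that by
Cauchy-Schwarz, writing $p = |A|^{-\ell}$,

$$\mathbb{E}\left(\sum_{i=1}^k (\log R_i - \log S_i)\right)^2 \leq 2\sum_{i=1}^k k^2\,\mathbb{E}\frac{1}{R_i^2} + 2\sum_{i=1}^k\left(\mathbb{E}(\log R_i)^2 + k^2\,\mathbb{E}\frac{1}{R_i^2}\right)\mathbb{P}(i \in D^c)$$

$$\leq 2k^3(p^2 + O(p^3)) + 2k^2 p\left(O((-\log p)^2) + k^2(p^2 + O(p^3))\right).$$

This follows since $\mathbb{E}(1/R_i^2) = p\,\mathrm{Li}_2(p)/(1 - p) = p^2 + O(p^3)$, where $\mathrm{Li}_2$ is the
dilogarithm function, and since for any $i$,

$$\mathbb{P}(i \in D^c) \leq \mathbb{P}(1 \in D^c) = 1 - \mathbb{P}(1 \in D) = 1 - (1 - p)^{k-1} \leq pk,$$

independently of $R_i$. The lemma follows on dividing by $k$. □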

Lemma 3.4 Suppose that $(Z_i)$ is an independent equidistributed process with
finite alphabet $A$. For any $i \neq j$, the $R_i$ defined in Definition 2.1 satisfy

$$|\operatorname{Cov}(\log R_i, \log R_j)| = O(\ell|A|^{-\ell}).$$

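Proof From the negative association proved in Proposition 3.2, we know
that the covariance is negative, so we need only bound it from below. For
any $x$, we know that, writing $p = |A|^{-\ell}$,

$$\mathbb{P}(R_j = y \mid R_i = x) = \begin{cases} \dfrac{p}{1-p}\left(\dfrac{1-2p}{1-p}\right)^{y-1} & \text{if } y < x, \\ 0 & \text{if } y = x, \\ \left(\dfrac{1-2p}{1-p}\right)^{x-1} p(1-p)^{y-x-1} & \text{if } y > x. \end{cases}$$

This means that

$$\mathbb{E}(\log R_j \mid R_i = x) = \sum_{y=1}^{x-1}\frac{p}{1-p}\left(\frac{1-2p}{1-p}\right)^{y-1}\log y + \left(\frac{1-2p}{1-p}\right)^{x-1}\sum_{y=x+1}^{\infty} p(1-p)^{y-x-1}\log y. \qquad (14)$$

In Equation (14) each summand is non-negative, so in the second sum we can
use $\log y \geq \log(y - x)$ and write $z = y - x$, to obtain that Equation (14) is
at least

$$\sum_{y=1}^{x-1}\frac{p}{1-p}\left(\frac{1-2p}{1-p}\right)^{y-1}\log y + \left(\frac{1-2p}{1-p}\right)^{x-1}\sum_{z=1}^{\infty} p(1-p)^{z-1}\log z = \sum_{y=1}^{x-1}\frac{p}{1-p}\left(\frac{1-2p}{1-p}\right)^{y-1}\log y + \left(\frac{1-2p}{1-p}\right)^{x-1}\mu(p).$$

Overall then, using the notation of Lemma 2.4,

$$\mathbb{E}(\log R_i \log R_j) \geq \sum_{x=1}^{\infty} p(1-p)^{x-1}\log x\left(\sum_{y=1}^{x-1}\frac{p}{1-p}\left(\frac{1-2p}{1-p}\right)^{y-1}\log y + \left(\frac{1-2p}{1-p}\right)^{x-1}\mu(p)\right).$$

Expanding this using Lemma 2.4, we deduce that $\operatorname{Cov}(\log R_i, \log R_j)$ is bounded
below by a constant multiple of $p\log p$. Indeed, asymptotically,
$\operatorname{Cov}(\log R_i, \log R_j) \geq p((\log p)/2 + \gamma)$.
□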
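The covariance in Lemma 3.4 can also be computed numerically. Multiplying the conditional probabilities above by $\mathbb{P}(R_i = x) = p(1-p)^{x-1}$ gives the joint law $\mathbb{P}(R_i = x, R_j = y) = p^2(1-2p)^{y-1}(1-p)^{x-y-1}$ for $y < x$, and symmetrically for $y > x$. Accumulating the inner geometric sums by a backward recursion then reduces $\mathbb{E}(\log R_i \log R_j)$ to a single sum. In this sketch, truncation after $60/p$ terms and the function name are our choices. It can be used to compare the exact value with the lower bound stated in the proof.

```python
import math


def cov_log_return_times(p: float) -> float:
    """Cov(log R_i, log R_j), i != j, equidistributed case, by truncated exact sums."""
    n_max = int(60 / p)
    # tail[y] = sum_{z>=1} (1-p)^(z-1) log(y+z), via tail[y] = log(y+1) + (1-p) tail[y+1]
    tail = [0.0] * (n_max + 2)
    for y in range(n_max, 0, -1):
        tail[y] = math.log(y + 1) + (1 - p) * tail[y + 1]
    # E(log R_i log R_j) = 2 p^2 sum_{y>=1} (1-2p)^(y-1) log(y) tail[y]  (pairs y < x, doubled)
    cross, w = 0.0, 1.0
    for y in range(1, n_max + 1):
        cross += w * math.log(y) * tail[y]
        w *= 1 - 2 * p
    cross *= 2 * p * p
    # mu(p) = E log R for R ~ Geom(p); both marginals are Geom(p)
    mu, w = 0.0, p
    for x in range(1, n_max + 1):
        mu += w * math.log(x)
        w *= 1 - p
    return cross - mu * mu


p = 0.005
cov = cov_log_return_times(p)
print(cov, p * math.log(p))  # negative, and smaller in size than |p log p|
```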
We can now deduce the Central Limit Theorem for $\log S_i$.



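Proposition 3.5 Suppose that $(Z_i)$ is an independent equidistributed finite
alphabet process with entropy $H$. If, as block length $\ell \to \infty$, the number of
blocks $k(\ell) \to \infty$ in such a way that $k(\ell)\ell|A|^{-\ell} \to 0$, then

$$\frac{\sum_{i=1}^{k(\ell)}(\log S_i - \ell H \log 2 + \gamma)}{\sqrt{k(\ell)\pi^2/6}} \xrightarrow{\mathcal{D}} N(0, 1).$$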



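Proof By Lemma 3.3 we need only prove the corresponding result for $\log R_i$.
By Proposition 3.2 the $R_i$ are negatively associated, and hence so are the
$\log R_i$. Since $H_{U,V}(u, v) \leq 0$ for all $u, v$, adapting Equation (6) as in Newman [?]
gives the following result: if $U_1, \ldots, U_k$ are negatively associated, then

$$\left|\mathbb{E}\exp\left(\frac{i\theta}{\sqrt{k}}\sum_{j=1}^k U_j\right) - \prod_{j=1}^k \mathbb{E}\exp\left(\frac{i\theta U_j}{\sqrt{k}}\right)\right| \leq \frac{\theta^2}{k}\sum_{1 \leq j < l \leq k}|\operatorname{Cov}(U_j, U_l)|. \qquad (15)$$

This means that, taking $\mu = \ell H \log 2 - \gamma$, and writing $\psi(\theta)$ for the characteristic
function of the $N(0, \pi^2/6)$ distribution:

$$\left|\mathbb{E}\exp\left(\frac{i\theta}{\sqrt{k}}\sum_{j=1}^k(\log R_j - \mu)\right) - \psi(\theta)\right| \leq \left|\mathbb{E}\exp\left(\frac{i\theta}{\sqrt{k}}\sum_{j=1}^k(\log R_j - \mu)\right) - \prod_{j=1}^k \mathbb{E}\exp\left(\frac{i\theta}{\sqrt{k}}(\log R_j - \mu)\right)\right| + \left|\prod_{j=1}^k \mathbb{E}\exp\left(\frac{i\theta}{\sqrt{k}}(\log R_j - \mu)\right) - \psi(\theta)\right|.$$

Equation (15) bounds the first term by $k\theta^2|\operatorname{Cov}(\log R_i, \log R_j)| = O(k(\ell)\ell|A|^{-\ell})$,
so we can control that term. We control the second term by using the Lyapunov
Central Limit Theorem, Equation (10). □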



4 Computational results 

We present the results of some calculations, based on simulations with
random number generators, and on the decimal digits of well-known constants.
In each case, we calculate the value of the statistic

$$\frac{\sum_{i=1}^{k(\ell)}(\log S_i - \ell H \log 2 + \gamma)}{\sqrt{k(\ell)\pi^2/6}}.$$

We present the results plotted as a quantile-quantile plot using R; the line
connects the upper and lower quartiles. If the distribution of the statistic
were exactly $N(0, 1)$, we would see the majority of the points lying very close
to the line $y = x$.

To produce Figures 4.1 and 4.2 we performed 500 trials on simulated data.
Figure 4.3 is based on breaking the first 20 million decimal digits of $\pi$ and
$e$ into 50 blocks of 400,000 digits each. We used the program PiFast, which
is freely downloadable from the Internet [?], and which can easily calculate
tens of millions of digits of constants such as $\pi$ and $e$.
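The simulation for equidistributed binary data can be sketched as follows. This is our own minimal Python illustration, not the code behind the figures. It assumes IID fair bits, sampled directly as uniform $\ell$-bit block values with `random.getrandbits`. Blocks are generated only until all of the first $k$ blocks have reappeared. The parameters in the usage example ($k = 8$, $\ell = 10$, 400 trials) and the function names are our choices.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler's constant, gamma


def simulated_return_times(k: int, ell: int, rng: random.Random) -> list[int]:
    """S_1..S_k (Definition 1.1) for IID uniform ell-bit blocks, generated lazily."""
    waiting: dict[int, list[int]] = {}  # block value -> indices j still awaiting a return
    s = [0] * k
    idx = 0
    while idx < k or waiting:
        x = rng.getrandbits(ell)
        for j in waiting.pop(x, []):
            s[j] = idx - j  # first later occurrence of X_j
        if idx < k:
            waiting.setdefault(x, []).append(idx)
        idx += 1
    return s


def run_trials(n_trials: int, k: int, ell: int, seed: int = 1) -> list[float]:
    """Statistic of Theorem 1.2 with H = 1 bit, one value per independent trial."""
    rng = random.Random(seed)
    scale = math.sqrt(k * math.pi ** 2 / 6)
    out = []
    for _ in range(n_trials):
        s = simulated_return_times(k, ell, rng)
        out.append(sum(math.log(si) - ell * math.log(2) + EULER_GAMMA for si in s) / scale)
    return out


values = run_trials(400, k=8, ell=10)
print(statistics.fmean(values), statistics.stdev(values))  # roughly 0 and 1
```

Generating blocks lazily avoids fixing a sample length in advance. Feeding `values` into any QQ-plot routine gives the analogue of Figure 4.1.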

In each case, the points do appear to lie on a straight line, though the sample
variance is slightly smaller than expected. This could be remedied by dividing
by the square root of the true variance

$$\operatorname{Var}\left(\sum_{i=1}^{k(\ell)}\log S_i\right) \approx k(\ell)\pi^2/6 + k(\ell)(k(\ell) - 1)\operatorname{Cov}(\log S_i, \log S_j) < k(\ell)\pi^2/6.$$

In order to do this, we would require an expansion, rather than simply an
order-of-magnitude bound, for the covariance in Lemma 3.4 (since the proof of
Lemma 3.3 shows that the sums of $\log R_i$ and of $\log S_i$ have the same variance,
asymptotically). Numerical calculation suggests that
$\operatorname{Cov}(\log R_i, \log R_j) \approx (p\log p)/4$. Of course, Lemma 3.4 only holds for
equidistributed processes. However, in general the Asymptotic Equipartition
Property suggests that we can assume that
$\operatorname{Cov}(\log R_i, \log R_j) \approx -(2^{-\ell H}\ell H \log 2)/4$.
Figure 4.1: QQ plots: equidistributed binary data; (a) $k = 250$, $\ell = 10$; (b)
$k = 1000$, $\ell = 13$.

Figure 4.2: QQ plots: asymmetric ($\mathbb{P}(Z_1 = 0) = 0.75$) binary data; (a)
$k = 250$, $\ell = 10$; (b) $k = 1000$, $\ell = 13$.

Figure 4.3: QQ plots: decimal data, $k = 1000$, $\ell = 4$; (a) digits of $\pi$; (b)
digits of $e$.